-
Notifications
You must be signed in to change notification settings - Fork 3.8k
chore: consistent expressionEvaluator.eval result memory ownership #19438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I disabled auto-merge as I have a few small nits, the biggest of which I'd like to move the defer x.Release() closer toe the creation of x, more open/close style, where possible.
| defer func() { | ||
| for _, a := range arrays { | ||
| a.Release() | ||
| } | ||
| }() | ||
|
|
||
| return array.NewRecord(schema, arrays, ct) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the defer Release approach when it's in an open/close situation, so for example immediately after creating an array. So, how would you feel about either a) moving this up into the builders loop above, so
for i, builder := range builders {
arrays[i] = builder.NewArray()
defer arrays[i].Release()
}
or b) storing the new record in a variable and then releasing synchronously (ie. not in a defer)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an existing pattern that I followed. I am not a fan of doing that without defer because there is risk of missing important cleanup in case of early returns or failures. Using defer is as simple and reliable as finally in Java.
| return failureState(err) | ||
| } | ||
| fields = append(fields, arrow.Field{Name: columnNames[i], Type: vec.Type().ArrowType(), Metadata: types.ColumnMetadata(vec.ColumnType(), vec.Type())}) | ||
| projected = append(projected, vec.ToArray()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do the defer release here, so it's closer to it's creation, ie. open/close style
| return nil, fmt.Errorf("unsupported datatype for partitioning %s", vec.Type()) | ||
| } | ||
|
|
||
| arrays = append(arrays, vec.ToArray().(*array.String)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above, can we do the release here?
| return nil, fmt.Errorf("unsupported datatype for grouping %s", vec.Type()) | ||
| } | ||
|
|
||
| arrays = append(arrays, vec.ToArray().(*array.String)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
|
||
| t.Run("filter with true literal predicate", func(t *testing.T) { | ||
| alloc := memory.DefaultAllocator | ||
| alloc := memory.NewCheckedAllocator(memory.NewGoAllocator()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we can't use memory.DefaultAllocator instead of instantiating a new GoAllocator()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be inconsistent between test files and GoAllocator was the one I randomly copy-pasted from. I fixed all test files to use DefaultAllocator.
…i into spiridonov-eval-arrow-memory
What this PR does / why we need it:
I found an inconsistent behavior in records returned from
eval. If there is a mathematical expression, then it returns a new column and the caller must release it. For predicates (infilter.go) it returns a new column if a predicate is non-trivial. If a predicate is just aColumnRef(for a boolean column) it will return this column as-is, and releasing this column will cause panic (too many releases). So I added an explicitRetain()on all nodes of an expression, so now it is consistent and the caller must callRelease()on all records fromeval()without worrying was it a column reference or an expression.I started by adding
memory.CheckedAllocatorto all tests and that's where I found leaks in the first place.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
CONTRIBUTING.mdguide (required)featPRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.mddeprecated-config.yamlanddeleted-config.yamlfiles respectively in thetools/deprecated-config-checkerdirectory. Example PR